428 research outputs found

Dictionary matching in a stream

Author: A.V. Aho
A.Z. Broder
D. Breslauer
D.E. Knuth
M. Crochemore
M. Ružić
R. Clifford
R.M. Karp
Publication date: 01/01/2015

We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
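To make the problem statement concrete, here is a minimal sketch of a naive baseline for dictionary matching in a stream. It is not the randomised algorithm from the abstract: it simply keeps the last m characters in a window and tests each dictionary length against a hash set, so it spends time proportional to the number of distinct word lengths (times the word length) per character. The function name `stream_matches` and its output format are assumptions made for illustration.

```python
from collections import deque


def stream_matches(dictionary, stream):
    """Yield (end_index, word) whenever a dictionary word ends at end_index.

    Naive baseline for illustration only, not the algorithm from the paper.
    """
    words = {w for w in dictionary if w}        # ignore empty strings
    lengths = sorted({len(w) for w in words})   # distinct word lengths
    m = max(lengths, default=0)                 # length of the longest word
    window = deque(maxlen=m)                    # the last m characters seen

    for i, ch in enumerate(stream):
        window.append(ch)
        recent = "".join(window)
        for n in lengths:
            # Check whether the last n characters form a dictionary word.
            if n <= len(recent) and recent[-n:] in words:
                yield i, recent[-n:]


if __name__ == "__main__":
    for end, word in stream_matches(["ab", "bab", "c"], "ababc"):
        print(end, word)
```

The window never holds more than m characters, which mirrors the streaming constraint that the full text is not stored; the per-character cost, however, is far from the O(log log(k + m)) bound quoted above.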

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Efficient comparison based string matching

Author: Breslauer D. (Dany)
Galil Z.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/1993

CWI's Institutional Repository

Online Detection of Repetitions with Backtracking

Author: A Apostolico
D Breslauer
H Leung
J Jansson
JJ Hong
M Crochemore
MG Main
Z Galil
Publication date: 01/01/2015

In this paper we present two algorithms for the following problem: given a string and a rational $e > 1$, detect in an online fashion the earliest occurrence of a repetition of exponent $\ge e$ in the string.
1. The first algorithm supports the backtrack operation, which removes the last letter of the input string. This solution runs in $O(n\log m)$ time and $O(m)$ space, where $m$ is the maximal length of a string generated during the execution of a given sequence of $n$ read and backtrack operations.
2. The second algorithm works in $O(n\log\sigma)$ time and $O(n)$ space, where $n$ is the length of the input string and $\sigma$ is the number of distinct letters. This algorithm is relatively simple and requires much less memory than the previously known solution with the same working time and space.
Comment: 12 pages, 5 figures, accepted to CPM 201
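As an illustration of the problem only, the following sketch detects the earliest position at which some substring ending there has exponent at least e. It assumes the standard definition of the exponent of a string as its length divided by its smallest period, which the abstract does not spell out. This is a brute-force check, not either of the two algorithms described above, and the names `smallest_period`, `exponent`, and `first_repetition_end` are hypothetical.

```python
from fractions import Fraction


def smallest_period(s):
    """Smallest period of a non-empty string s, via the KMP failure function."""
    n = len(s)
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    # The longest proper border has length fail[-1]; the period is the rest.
    return n - fail[-1]


def exponent(s):
    """Exponent of s: length divided by smallest period (assumed definition)."""
    return Fraction(len(s), smallest_period(s))


def first_repetition_end(text, e):
    """Return the smallest prefix length at which a suffix has exponent >= e.

    Brute force: every suffix of every prefix is examined.
    Returns None if no such repetition occurs.
    """
    for end in range(1, len(text) + 1):
        for start in range(end):
            if exponent(text[start:end]) >= e:
                return end
    return None


if __name__ == "__main__":
    print(first_repetition_end("abcabc", 2))
    print(first_repetition_end("abcab", Fraction(3, 2)))
```

Using `Fraction` keeps the comparison against a rational threshold exact, which matters because e is stated to be rational.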

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Ligand-induced formation of nucleic acid triple helices.

Author: D. S. Pilch
K. J. Breslauer
Publication venue: 'Proceedings of the National Academy of Sciences'

Crossref

Tight comparison bounds for the string prefix-matching problem

Author: Breslauer D. (Dany)
Colussi L. (Livio)
Toniolo L. (Laura)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1993

CWI's Institutional Repository

A Very Large Array Search for 5 GHz Radio Transients and Variables at Low Galactic Latitudes

Author: Breslauer B.
Chandra P.
Frail D. A.
Gal-Yam A.
Gehrels N.
Kasliwal M. M.
Kulkarni S. R.
Ofek E. O.
Publication venue: 'American Astronomical Society'
Publication date: 20/10/2011

We present the results of a 5 GHz survey with the Very Large Array (VLA) and the expanded VLA, designed to search for short-lived (≾1 day) transients and to characterize the variability of radio sources at milli-Jansky levels. A total sky area of 2.66 deg^2, spread over 141 fields at low Galactic latitudes (b≅6-8 deg), was observed 16 times with a cadence that was chosen to sample timescales of days, months, and years. Most of the data were reduced, analyzed, and searched for transients in near real-time. Interesting candidates were followed up using visible light telescopes (typical delays of 1-2 hr) and the X-ray Telescope on board the Swift satellite. The final processing of the data revealed a single possible transient with a peak flux density of f_ν≅2.4 mJy. This implies a transient sky surface density of κ(f_ν > 1.8 mJy) = 0.039^(+0.13,+0.18)_(–0.032,–0.038) deg^(–2) (1σ, 2σ confidence errors). This areal density is roughly consistent with the sky surface density of transients from the Bower et al. survey extrapolated to 1.8 mJy. Our observed transient areal density is consistent with a neutron star origin for these events. Furthermore, we use the data to measure the source variability on timescales of days to years, and we present the variability structure function of 5 GHz sources. The mean structure function shows a fast increase on ≈1 day timescale, followed by a slower increase on timescales of up to 10 days. On timescales between 10 and 60 days, the structure function is roughly constant. We find that ≳30% of the unresolved sources brighter than 1.8 mJy are variables at the >4σ confidence level, presumably mainly due to refractive scintillation.

NASA Technical Reports Server

Caltech Authors

Fast Algorithm for Partial Covers in Words

Author: A. Apostolico
A.S. Fraenkel
D. Breslauer
D. Gusfield
D. Moore
E. Ukkonen
G.S. Brodal
J.S. Sim
M. Crochemore
M.R. Brown
Y. Li
Publication date: 01/01/2013

A factor $u$ of a word $w$ is a cover of $w$ if every position in $w$ lies within some occurrence of $u$ in $w$. A word $w$ covered by $u$ thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of $u$. In this article we introduce a new notion of $\alpha$-partial cover, which can be viewed as a relaxed variant of cover, that is, a factor covering at least $\alpha$ positions in $w$. We develop a data structure of $O(n)$ size (where $n=|w|$) that can be constructed in $O(n\log n)$ time, which we apply to compute all shortest $\alpha$-partial covers for a given $\alpha$. We also employ it for an $O(n\log n)$-time algorithm computing a shortest $\alpha$-partial cover for each $\alpha=1,2,\ldots,n$.
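The definitions of cover and partial cover in the abstract above can be checked directly with a short sketch. This is a straightforward quadratic-time check written for illustration, not the O(n log n) data structure the abstract refers to; the function names `covered_positions`, `is_cover`, and `is_partial_cover` are assumptions.

```python
def covered_positions(u, w):
    """Count positions of w that lie within some occurrence of u in w."""
    if not u:
        return 0
    covered = [False] * len(w)
    start = w.find(u)
    while start != -1:
        for i in range(start, start + len(u)):
            covered[i] = True
        # Search again from the next position so overlapping
        # occurrences are also found.
        start = w.find(u, start + 1)
    return sum(covered)


def is_cover(u, w):
    """True if every position of the non-empty word w is covered by u."""
    return len(w) > 0 and covered_positions(u, w) == len(w)


def is_partial_cover(u, w, alpha):
    """True if u covers at least alpha positions of w."""
    return covered_positions(u, w) >= alpha


if __name__ == "__main__":
    w = "abaababa"
    print(is_cover("aba", w))
    print(covered_positions("ab", w))
```

In this example "aba" occurs at positions 0, 3, and 5 of "abaababa", and those occurrences together touch every position, whereas "ab" leaves positions 2 and 7 uncovered.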

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

King's Research Portal

Efficient Seeds Computation Revisited

Author: A. Apostolico
C.S. Iliopoulos
D. Breslauer
G.S. Brodal
J. Fischer
K. Sadakane
M. Crochemore
O. Berkman
Y. Li
Publication date: 01/01/2011

The notion of the cover is a generalization of a period of a string, and there are linear time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, and no linear time algorithm is known. In the paper we give linear time algorithms for some of its versions: computing the shortest left-seed array, computing the longest left-seed array, and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in $O(n^2)$ time. We also describe a simpler alternative algorithm that efficiently computes the shortest seeds. As a by-product we obtain an $O(n\log(n/m))$ time algorithm checking if the shortest seed has length at least $m$ and finding the corresponding seed. We also correct some important details missing in the previously known shortest-seed algorithm (Iliopoulos et al., 1996).
Comment: 14 pages, accepted to CPM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Covering Problems for Partial Words and for Indeterminate Strings

Author: A Apostolico
A Kalai
CS Iliopoulos
D Breslauer
D Lokshtanov
D Moore
J Holub
KR Abrahamson
MF Bari
MJ Fischer
P Antoniou
R Impagliazzo
T Kociumaka
WF Smyth
Y Li
Publication date: 01/01/2014

We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that the indeterminate string covering problem and the partial word covering problem are NP-complete for a binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{O(k \log k)} + n k^{O(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{O(\sqrt{k}\log k)} + nk^{O(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice.
Comment: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figures

arXiv.org e-Print Archive

Crossref

King's Research Portal

Palindromic Decompositions with Gaps and Errors

Author: A Apostolico
A Frid
D Breslauer
D Gusfield
D Kosolobov
DE Knuth
G Fici
G Manacher
M Crochemore
M Rubinchik
R Kolpakov
S Gupta
T I
X Droubay
Y Fujishige
Z Galil
Publication date: 27/03/2017

Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with HIV. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also by imposing a lower bound on the length of acceptable palindromes. We first present an algorithm for obtaining a palindromic decomposition of a string of length n with the minimal total gap length in time O(n log n * g) and space O(n g), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string into maximal $\delta$-palindromes (i.e., palindromes with $\delta$ errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time O(n (g + $\delta$)) and space O(n g).
Comment: accepted to CSR 201

arXiv.org e-Print Archive

Crossref